                               CHAPTER 2
                         HOW DECTALK PC WORKS.
                                   
This chapter describes how DECtalk PC converts ASCII data into voice
output.

ACCEPTING TEXT AND COMMANDS.
The operation of the DECtalk PC depends on its own Terminate and Stay
Resident (TSR) program. This program accepts ASCII text and commands
from an application program, such as a screen access program. It then
parses the commands and sends the appropriate commands and text to
the DECtalk PC. It also keeps track of the status of the DECtalk
board and returns information to the application program, such as
replies to index queries.

CONVERTING TEXT TO SPEECH.
The DECtalk uses the text and commands sent by the TSR to convert the
text into speech in a three-level process. The commands control
aspects of the DECtalk such as the voice, speaking rate, and volume.

LEVEL 1.
DECtalk  first accepts text from the  PC and converts the  text  from
one  code  into another. The text  is in ASCII format when it  enters
DECtalk, and is  converted to phonemic code for further processing.
Phonemic code uses the phonemic alphabet described below. Each symbol
in the phonemic alphabet has only one pronunciation.  DECtalk uses an
internal dictionary and the rules of English pronunciation to perform
this conversion.

LEVEL 2.
The  phonemic code is converted into synthesizer control  parameters.
These  are continuous variables which control  aspects of the  speech
such  as  pitch,  amplitude, duration and the like  for  the  various
DECtalk voices.

LEVEL 3.
The  speech  synthesizer uses the control parameters  to  generate  a
speech  waveform.  This waveform is converted  to  an  analog  speech
signal by a D/A converter.

In Levels 2 and 3, a synthesizer control command (a set of phonetic
parameters) is generated every 6.4 milliseconds, and the digital
signal processor generates a speech waveform value every 100
microseconds. This process generates "frames" of speech. DECtalk
acts somewhat like a TV picture, in that these frames of speech are
presented to the listener just as frames of pictures are presented
to the viewer. In both cases, the frames appear to be one continuous,
unbroken sequence.
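
The two timing figures above imply a few derived numbers that may be
useful to keep in mind. The short Python calculation below works them
out. Only the 6.4 millisecond and 100 microsecond figures come from
this chapter; the variable names are illustrative.

```python
# Timing figures from the text; everything else is derived arithmetic.
FRAME_PERIOD_US = 6400   # one control command every 6.4 milliseconds
SAMPLE_PERIOD_US = 100   # one waveform value every 100 microseconds

# Waveform values produced during one control frame.
samples_per_frame = FRAME_PERIOD_US // SAMPLE_PERIOD_US
# Waveform values produced per second (the output sample rate in Hz).
sample_rate_hz = 1_000_000 // SAMPLE_PERIOD_US
# Control commands produced per second.
frames_per_second = 1_000_000 / FRAME_PERIOD_US

print(samples_per_frame, sample_rate_hz, frames_per_second)
# prints: 64 10000 156.25
```

In other words, each set of control parameters governs 64 consecutive
waveform values, and roughly 156 such frames are produced per second.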

DECtalk PC SOFTWARE PROGRAM.
The  three-level process described in the last section happens  in  a
number of discrete program modules. Each module is described  briefly
in the following paragraphs.

CONVERTING ASCII TEXT TO PHONEMIC CODE.
1.  A sentence parser breaks the input stream into separate words and
locates  some  clause  boundaries (indicated  by   commas  and  other
punctuation  marks  as well as special words).  The  sentence  parser
also  recognizes and deals with phonemic symbols and   commands  that
you may have added to the input text. Phonemics are discussed below.

2.  A  word parser breaks words into their component parts,  yielding
words in their final pronounceable form. Strings of text that do  not
form pronounceable English words are  spelled out letter by letter. A
number  formatter is used if the text contains numerals.  The  number
formatter knows the rules for many common number formats and converts
the numbers into English  words. The number formatter also recognizes
many  common  abbreviations, such as "lb."  for  "pound(s)."  Number-
speaking rules are discussed later in this manual.

3.  A   dictionary   lookup  routine  searches  the   pronunciation
dictionaries. DECtalk has a built-in dictionary of many commonly used
words.   DECtalk  also has definable dictionaries for developers  and
users that can be filled with words specific to an application. These
dictionaries and their loading are described in Chapter  Five.  While
this  version of DECtalk has greater pronunciation accuracy than  its
predecessors, it may sometimes be necessary  to send the DECtalk card
the   correct   phonemics  for  words  important  for  a   particular
application.   This   can  be  done  using  the   developer-definable
dictionary. Dictionaries are searched in the reverse of the order in
which they were loaded onto the DECtalk board: the last dictionary
loaded is the first one searched, and so on.

4. A letter-to-sound module uses a set of English pronunciation rules
to assign phonemic form and lexical stress patterns to words not
found in the dictionary. Chapter 4 describes the text processing
rules for DECtalk, including numbers, abbreviations, word spellout
strategies, and homographs. See Chapter 5, Phonemics and Voices, for
ways to modify the phonemic form of words, and subsequent chapters
for special voice qualities (such as emphasis and singing).

5.  A phrase structure module recombines all phonemic output from the
dictionary search and other modules. Durations  of phonemes and pitch
commands are computed for the  clause, and appropriate sound variants
are  selected for  those phonemes that can be pronounced in more than
one  way.
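
The first steps above can be sketched in a few lines of Python. This
is a simplified illustration only, not DECtalk's actual code. All
function names are hypothetical, the pronunciations are placeholder
strings rather than real phonemic code, and several rules are
assumptions made for the sketch: clauses end at any punctuation mark,
a token with no vowel letter counts as unpronounceable, numbers are
spoken as words only up to 999, and the built-in dictionary is
treated as the first one loaded.

```python
import re

# Illustrative data only; none of these entries come from DECtalk itself.
ONES = ("zero one two three four five six seven eight nine ten eleven "
        "twelve thirteen fourteen fifteen sixteen seventeen eighteen "
        "nineteen").split()
TENS = "- - twenty thirty forty fifty sixty seventy eighty ninety".split()
PUNCTUATION = {",", ";", ":", ".", "!", "?"}

def split_clauses(text):
    """Step 1: split text into clauses (lists of tokens) at punctuation."""
    clauses, current = [], []
    for token in re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[,;:.!?]", text):
        if token in PUNCTUATION:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(token)
    if current:
        clauses.append(current)
    return clauses

def number_to_words(n):
    """Step 2: spell out an integer from 0 to 999 as English words."""
    if n < 20:
        return [ONES[n]]
    if n < 100:
        tens, ones = divmod(n, 10)
        return [TENS[tens]] + ([ONES[ones]] if ones else [])
    hundreds, rest = divmod(n, 100)
    return [ONES[hundreds], "hundred"] + (number_to_words(rest) if rest else [])

def expand_token(token):
    """Step 2: turn one token into a list of speakable words."""
    if token.isdecimal():
        if len(token) <= 3:
            return number_to_words(int(token))
        return [ONES[int(digit)] for digit in token]  # longer: digit by digit
    if not re.search(r"[aeiouyAEIOUY]", token):
        return list(token.upper())                    # spell out letter by letter
    return [token.lower()]

def lookup(word, dictionaries):
    """Step 3: search dictionaries newest-first; None means not found."""
    for dictionary in reversed(dictionaries):
        if word in dictionary:
            return dictionary[word]
    return None

def front_end(text, dictionaries):
    """Steps 1-3: return, per clause, a list of (word, pronunciation) pairs."""
    result = []
    for clause in split_clauses(text):
        pairs = []
        for token in clause:
            for word in expand_token(token):
                # None marks a word that step 4 (letter-to-sound) must handle.
                pairs.append((word, lookup(word, dictionaries)))
        result.append(pairs)
    return result

builtin = {"hello": "PRON-1"}                      # loaded first
user = {"hello": "PRON-2", "dectalk": "PRON-3"}    # loaded last, searched first
clauses = front_end("Hello, DECtalk has 42 voices.", [builtin, user])
```

Here "hello" resolves to the entry in the dictionary loaded last, and
every word paired with None would be passed on to the letter-to-sound
rules of step 4.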

CONVERTING PHONEMIC CODE TO SYNTHESIZER CONTROL COMMANDS.
6.  The  phoneme-to-voice module processes clauses  passed  from  the
phrase structure module and converts them to control  signals for the
speech synthesizer. This module modifies the clauses by changing  the
phonemes/allophones  into  parameters  that  determine  the   natural
resonant  frequencies  of the vocal tract  (formants),  sound  source
amplitudes,  and  the like. The control parameters are  sent  to  the
speech synthesizer for output.
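
One way to picture the output of this step is as a list of control
frames, one for every 6.4 milliseconds of speech. The Python sketch
below is an assumption-laden illustration, not the DECtalk module
itself: the ControlFrame fields, the function name and the sample
values are all hypothetical, and the parameters are simply held
constant for the whole phoneme, whereas a real synthesizer front end
would vary them smoothly from frame to frame.

```python
from dataclasses import dataclass

FRAME_PERIOD_US = 6400  # one control command every 6.4 ms, per the text

@dataclass
class ControlFrame:
    """Hypothetical set of synthesizer control parameters for one frame."""
    pitch_hz: float
    amplitude: float
    formants_hz: tuple  # resonant frequencies of the vocal tract

def frames_for_phoneme(duration_ms, pitch_hz, amplitude, formants_hz):
    """Emit one frame per 6.4 ms of the phoneme, with at least one frame."""
    count = max(1, round(duration_ms * 1000 / FRAME_PERIOD_US))
    return [ControlFrame(pitch_hz, amplitude, formants_hz) for _ in range(count)]

# Made-up values for a 64 ms vowel-like sound.
frames = frames_for_phoneme(64, 120.0, 0.8, (700.0, 1200.0, 2600.0))
print(len(frames))
# prints: 10
```

A 64 millisecond phoneme therefore spans ten control frames, each of
which the synthesizer would turn into 64 waveform values.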

CONVERTING CONTROL COMMANDS TO SPEECH.
7.  The  digital speech synthesizer computes a speech  waveform  with
acoustic  characteristics  that are determined  by  the   synthesizer
control commands received. Replicating the sounds of human speech  in
a  natural  way is an extremely difficult task.  Dozens  of  acoustic
parameters  and thousands of values of these parameters  have  to  be
taken  into  account.   DECtalk is widely believed  to  be  the  best
English  language synthesizer available anywhere for  intelligibility
and naturalness.

End of Chapter 2.
    
